Goto

Collaborating Authors

 rmsprop 0


Appendices

Neural Information Processing Systems

And, for each of them, the second (final) stripe has 44 options. It could seem that small improvements in efficacy may have only a minor effect on final network accuracy, especially considering the noisiness inherent in large-scale training. Better thanreducing themagnitude oflostweights, though, iscompletely eliminating it - by using the zeros already present in the unstructured sparse weight matrix, it may be possible to find a permutation that does notloseanymagnitude after applying theN:M constraint.



Appendix for Integrating Momentum into Recurrent Neural Networks

Neural Information Processing Systems

Section 3.1, we flatten and process the image as a sequence of the length of 784 pixel-by-pixel. The baseline LSTM models consist of one LSTM cell with 128 and 256 hidden units. Orthogonal initialization is used for input-to-hidden weights, while hidden-to-hidden weights are initialized to identity matrices. The gradient norms are clipped to 1 during training. The log-magnitude of these sequences is fed into the models as the input data.


Gradient Flow Matching for Learning Update Dynamics in Neural Network Training

Shou, Xiao, Ding, Yanna, Gao, Jianxi

arXiv.org Machine Learning

Training deep neural networks remains computationally intensive due to the itera2 tive nature of gradient-based optimization. We propose Gradient Flow Matching (GFM), a continuous-time modeling framework that treats neural network training as a dynamical system governed by learned optimizer-aware vector fields. By leveraging conditional flow matching, GFM captures the underlying update rules of optimizers such as SGD, Adam, and RMSprop, enabling smooth extrapolation of weight trajectories toward convergence. Unlike black-box sequence models, GFM incorporates structural knowledge of gradient-based updates into the learning objective, facilitating accurate forecasting of final weights from partial training sequences. Empirically, GFM achieves forecasting accuracy that is competitive with Transformer-based models and significantly outperforms LSTM and other classical baselines. Furthermore, GFM generalizes across neural architectures and initializations, providing a unified framework for studying optimization dynamics and accelerating convergence prediction.


Understanding Optimization in Deep Learning with Central Flows

Cohen, Jeremy M., Damian, Alex, Talwalkar, Ameet, Kolter, Zico, Lee, Jason D.

arXiv.org Machine Learning

Optimization in deep learning remains poorly understood, even in the simple setting of deterministic (i.e. full-batch) training. A key difficulty is that much of an optimizer's behavior is implicitly determined by complex oscillatory dynamics, referred to as the "edge of stability." The main contribution of this paper is to show that an optimizer's implicit behavior can be explicitly captured by a "central flow:" a differential equation which models the time-averaged optimization trajectory. We show that these flows can empirically predict long-term optimization trajectories of generic neural networks with a high degree of numerical accuracy. By interpreting these flows, we reveal for the first time 1) the precise sense in which RMSProp adapts to the local loss landscape, and 2) an "acceleration via regularization" mechanism, wherein adaptive optimizers implicitly navigate towards low-curvature regions in which they can take larger steps. This mechanism is key to the efficacy of these adaptive optimizers. Overall, we believe that central flows constitute a promising tool for reasoning about optimization in deep learning.


Benchmark Analysis of Various Pre-trained Deep Learning Models on ASSIRA Cats and Dogs Dataset

Himel, Galib Muhammad Shahriar, Islam, Md. Masudul

arXiv.org Artificial Intelligence

As the most basic application and implementation of deep learning, image classification has grown in popularity. Various datasets are provided by renowned data science communities for benchmarking machine learning algorithms and pre-trained models. The ASSIRA Cats & Dogs dataset is one of them and is being used in this research for its overall acceptance and benchmark standards. A comparison of various pre-trained models is demonstrated by using different types of optimizers and loss functions. Hyper-parameters are changed to gain the best result from a model. By applying this approach, we have got higher accuracy without major changes in the training model. To run the experiment, we used three different computer architectures: a laptop equipped with NVIDIA GeForce GTX 1070, a laptop equipped with NVIDIA GeForce RTX 3080Ti, and a desktop equipped with NVIDIA GeForce RTX 3090. The acquired results demonstrate supremacy in terms of accuracy over the previously done experiments on this dataset. From this experiment, the highest accuracy which is 99.65% is gained using the NASNet Large.


Realistic mask generation for matter-wave lithography via machine learning

Fiedler, Johannes, Palau, Adrià Salvador, Osestad, Eivind Kristen, Parviainen, Pekka, Holst, Bodil

arXiv.org Artificial Intelligence

Fast production of large area patterns with nanometre resolution is crucial for the established semiconductor industry and for enabling industrial-scale production of next-generation quantum devices. Metastable atom lithography with binary holography masks has been suggested as a higher resolution/low-cost alternative to the current state of the art: extreme ultraviolet (EUV) lithography. However, it was recently shown that the interaction of the metastable atoms with the mask material (SiN) leads to a strong perturbation of the wavefront, not included in existing mask generation theory, which is based on classical scalar waves. This means that the inverse problem (creating a mask based on the desired pattern) cannot be solved analytically even in 1D. Here we present a machine learning approach to mask generation targeted for metastable atoms. Our algorithm uses a combination of genetic optimisation and deep learning to obtain the mask. A novel deep neural architecture is trained to produce an initial approximation of the mask. This approximation is then used to generate the initial population of the genetic optimisation algorithm that can converge to arbitrary precision. We demonstrate the generation of arbitrary 1D patterns for system dimensions within the Fraunhofer approximation limit.


ConformalLayers: A non-linear sequential neural network with associative layers

Sousa, Eduardo Vera, Fernandes, Leandro A. F., Vasconcelos, Cristina Nader

arXiv.org Artificial Intelligence

Convolutional Neural Networks (CNNs) have been widely applied. But as the CNNs grow, the number of arithmetic operations and memory footprint also increase. Furthermore, typical non-linear activation functions do not allow associativity of the operations encoded by consecutive layers, preventing the simplification of intermediate steps by combining them. We present a new activation function that allows associativity between sequential layers of CNNs. Even though our activation function is non-linear, it can be represented by a sequence of linear operations in the conformal model for Euclidean geometry. In this domain, operations like, but not limited to, convolution, average pooling, and dropout remain linear. We take advantage of associativity to combine all the "conformal layers" and make the cost of inference constant regardless of the depth of the network.


Classification and Feature Transformation with Fuzzy Cognitive Maps

Szwed, Piotr

arXiv.org Artificial Intelligence

Fuzzy Cognitive Maps (FCMs) are considered a soft computing technique combining elements of fuzzy logic and recurrent neural networks. They found multiple application in such domains as modeling of system behavior, prediction of time series, decision making and process control. Less attention, however, has been turned towards using them in pattern classification. In this work we propose an FCM based classifier with a fully connected map structure. In contrast to methods that expect reaching a steady system state during reasoning, we chose to execute a few FCM iterations (steps) before collecting output labels. Weights were learned with a gradient algorithm and logloss or cross-entropy were used as the cost function. Our primary goal was to verify, whether such design would result in a descent general purpose classifier, with performance comparable to off the shelf classical methods. As the preliminary results were promising, we investigated the hypothesis that the performance of $d$-step classifier can be attributed to a fact that in previous $d-1$ steps it transforms the feature space by grouping observations belonging to a given class, so that they became more compact and separable. To verify this hypothesis we calculated three clustering scores for the transformed feature space. We also evaluated performance of pipelines built from FCM-based data transformer followed by a classification algorithm. The standard statistical analyzes confirmed both the performance of FCM based classifier and its capability to improve data. The supporting prototype software was implemented in Python using TensorFlow library.


Are we Forgetting about Compositional Optimisers in Bayesian Optimisation?

Grosnit, Antoine, Cowen-Rivers, Alexander I., Tutunov, Rasul, Griffiths, Ryan-Rhys, Wang, Jun, Bou-Ammar, Haitham

arXiv.org Machine Learning

Bayesian optimisation presents a sample-efficient methodology for global optimisation. Within this framework, a crucial performance-determining subroutine is the maximisation of the acquisition function, a task complicated by the fact that acquisition functions tend to be non-convex and thus nontrivial to optimise. In this paper, we undertake a comprehensive empirical study of approaches to maximise the acquisition function. Additionally, by deriving novel, yet mathematically equivalent, compositional forms for popular acquisition functions, we recast the maximisation task as a compositional optimisation problem, allowing us to benefit from the extensive literature in this field. We highlight the empirical advantages of the compositional approach to acquisition function maximisation across 3958 individual experiments comprising synthetic optimisation tasks as well as tasks from Bayesmark. Given the generality of the acquisition function maximisation subroutine, we posit that the adoption of compositional optimisers has the potential to yield performance improvements across all domains in which Bayesian optimisation is currently being applied.